DOCUMENT RESUME

ED 199 293                                                    TM 810 244

AUTHOR        Ory, John C.; Valois, Robert F.
TITLE         The Influence of Negatively Worded Scale Items on
              Overall Ratings.
PUB DATE      [80]
NOTE          13p.
EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   *Course Evaluation; Higher Education; Item Banks;
              *Negative Forms (Language); Rating Scales; *Student
              Evaluation of Teacher Performance; Test Format; *Test
              Items
IDENTIFIERS   *Instructor and Course Evaluation System



ABSTRACT

Two studies investigated whether the placement and/or wording (either
positively or negatively) of diagnostic rating scale items influenced student
responses to the global items in the evaluation of a course of instruction.
The Instructor and Course Evaluation System (ICES), developed at the
University of Illinois, Urbana-Champaign, was used to conduct end-of-semester
course evaluations. Thirty diagnostic items were selected from a catalog
containing approximately 500 items. Approximately half of the diagnostic
items were about the course and half were about the instructor. Twenty of the
30 items were rewritten to create positively and negatively worded versions
of each item. Three negative wording conditions were repeated on scales with
the two global items appearing either before or after the 30 diagnostic
items. In the two studies, 455 undergraduates were randomly administered one
of six evaluation forms. Results of a 2 x 3 analysis of variance with
repeated measures indicated that the instructor was rated significantly
higher than the course. In neither study were the overall ratings of the
instructor or course affected by the negative or positive wording of the
diagnostic items. (BL)



* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *



The Influence of Negatively Worded Scale Items 
on Overall Ratings 



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
NATIONAL INSTITUTE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 



John C. Ory 
Robert F. Valois
University of Illinois 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 



In his systematic study of response sets and their effects, Cronbach
(1946, 1950) identified acquiescence as a response tendency to favor
affirmative responses over negative responses. Couch and Keniston (1960)
later called this tendency "yea- or nay-saying," wherein respondents
consistently select in one direction, either positive or negative. Their
belief was that some individuals have a general disposition on the
positive/negative continuum regardless of the content of the items.
Consequently, the responses of these same individuals may indicate something
other than that which was intended to be measured.

To avoid response bias due to yea- or nay-saying, psychometricians
recommend counterbalancing the questions asked, so that a positive
response to one question and a negative response to another both contribute
towards increasing the score on the measure as a whole (Lemon, 1973; Likert,
1932; Edwards, 1957). Likert (1932) suggested that these "two kinds of
statements ought to be distributed throughout the attitude test in a chance
or haphazard manner" (p. 91).
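
The scoring consequence of counterbalancing can be sketched in a few lines of
Python. This is a minimal illustration rather than a procedure described by
the authors: it assumes a 5-point agreement scale coded 1 to 5, and the
function names and the sample responses are hypothetical.

```python
# Minimal sketch of scoring a counterbalanced scale (assumed 5-point coding).
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response: int) -> int:
    """Flip a response so that disagreeing with a negative item scores high."""
    return SCALE_MIN + SCALE_MAX - response


def total_score(responses: list[int], negatively_worded: list[bool]) -> int:
    """Sum item scores after reverse-scoring the negatively worded items."""
    return sum(
        reverse_score(r) if negative else r
        for r, negative in zip(responses, negatively_worded)
    )


# Hypothetical respondent: agrees (5) with a positive item and
# disagrees (1) with a negative item; both raise the total.
print(total_score([5, 1], [False, True]))  # 10
```

A respondent who simply agrees with everything would score 5 + 1 = 6 on the
same pair of items, which is how counterbalancing keeps acquiescence from
inflating the total.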

For rating scales used to evaluate a new project, person, or course of
instruction, the above solution calls for the inclusion of both negatively
and positively stated items about the object or person being evaluated. For
example, course evaluation items should include items which make positive and
negative statements about the course, such as:

    This course provided an opportunity to learn from other students.
    (positive)

    Teaching methods used in this course were poorly chosen. (negative)

What has yet to be studied is the possible effect of negatively worded
items on raters' evaluations. Do negatively worded items "encourage" a more
critical evaluation than do positively worded items? Negatively worded items
may highlight the negative aspects or faults of the object or person being
evaluated, or may serve to unconsciously suggest to the rater particular
problem areas anticipated by the evaluator. If so, rating scale evaluations
may be affected as much by the wording of the items as by the quality of the
object or person being evaluated.

The possible effect of item setting on overall ratings is particularly
relevant to many of today's available student rating instruments. Most of the
available instruments include two kinds of scale items: global, or generally
stated, items and diagnostic, or specifically stated, items. Global items
measure student evaluations of general areas of instruction, while diagnostic
items measure student judgments and observations of specific behaviors of the
instructor, instructional techniques, and detailed student outcomes. The
following examples of each type of item are included in the Instructor and
Course Evaluation System (ICES) developed at the University of Illinois,
Urbana-Champaign (Note 1).

    Global:      Rate the course in general.

                 Excellent                Poor

    Diagnostic:  The instructor motivated me to do my best work.

                 Almost Always            Almost Never


The working principle behind the ICES two-way classification of items
(global and diagnostic) is that different types of items should be used for
different purposes. The diagnostic items are best suited for the purpose of
faculty improvement, while the global items are most useful for providing
summative information needed for personnel decisions (Brandenburg, Braskamp,
and Ory, 1979). Faculty, therefore, select those diagnostic items they
consider appropriate for their particular course. Each faculty questionnaire
also includes three global items: Rate the course content, Rate the
instructor, and Rate the course in general. Normative data are provided for
these global items only, so that campus-wide comparisons can be made.

Unfortunately, little is known about the relationship between student
responses to faculty-selected diagnostic items and global evaluation items. It
has yet to be determined if the type of diagnostic item chosen by a faculty
member can influence student responses on the global items. The purpose of
these two studies was to investigate whether the placement and/or wording
(either positively or negatively) of diagnostic rating scale items influenced
student responses to the global items in the evaluation of a course of
instruction.

Instrumentation

The ICES system was used to conduct the end-of-semester course evaluations.
ICES is a cafeteria-type student rating system that permits each instructor to
select diagnostic items from a catalog containing approximately 500 items. As
was stated earlier, the first three items on all student questionnaires are
global items. For purposes of this study, students responded to only the last
two global items: Rate the instructor and Rate the course in general.
Respondents indicated their rating on these two items on a 5-point scale, with
anchor points of "poor" (=1) and "excellent" (=5). ICES questionnaires used in
the studies included thirty diagnostic items. Approximately half of the items
were about the course and half were about the instructor. Twenty of the thirty
diagnostic items were rewritten to create a positively and a negatively worded
version of each item. For example:

    Positive version = Exams covered a reasonable amount of material.
    Negative version = Exams covered an unreasonable amount of material.

In total, six evaluation forms were constructed containing 32 items each. The
content and design of the six forms are shown in Figure 1. As illustrated,
the three negative wording conditions (0/30, 10/30, 20/30) were repeated on
scales with the two global items appearing either before or after the 30
diagnostic items. It was believed that if the wording of the diagnostic items
were to influence student responses to the global items, such an effect might
be more noticeable if the global items were presented after rather than before
the diagnostic items.

          Proportion of
          negatively worded
          diagnostic items     Item format

Form 1:   0/30                 Global before diagnostic items
Form 2:   10/30                Global before diagnostic items
Form 3:   20/30                Global before diagnostic items
Form 4:   0/30                 Diagnostic before global items
Form 5:   10/30                Diagnostic before global items
Form 6:   20/30                Diagnostic before global items

Figure 1: The content and design of the six evaluation forms.
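
The 2 x 3 layout in Figure 1 can also be enumerated programmatically. The
sketch below only illustrates the crossing of the two factors; the field
names and the `build_forms` helper are hypothetical, and the item texts
themselves are not reproduced.

```python
from itertools import product

N_DIAGNOSTIC = 30   # diagnostic items per form (from the text)
N_GLOBAL = 2        # "Rate the instructor" and "Rate the course in general"
PLACEMENTS = ("global_before_diagnostic", "diagnostic_before_global")
NEGATIVE_COUNTS = (0, 10, 20)   # negatively worded diagnostic items per form


def build_forms() -> list[dict]:
    """Cross the two placement conditions with the three wording conditions."""
    forms = []
    for number, (placement, n_negative) in enumerate(
        product(PLACEMENTS, NEGATIVE_COUNTS), start=1
    ):
        forms.append(
            {
                "form": number,
                "placement": placement,
                "negative_items": n_negative,
                "total_items": N_DIAGNOSTIC + N_GLOBAL,
            }
        )
    return forms


for form in build_forms():
    print(form["form"], f'{form["negative_items"]}/{N_DIAGNOSTIC}', form["placement"])
```

Iterating placement in the outer position reproduces the numbering used in
Figure 1, with Forms 1 to 3 placing the global items first and Forms 4 to 6
placing them last.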

Study One 

Methods. During the last week of the 1980 Spring semester, 180 students
enrolled in an undergraduate introductory health education course taught at a
Midwestern university were randomly administered one of the six evaluation
forms. Thirty students completed each of the six forms.
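
The paper does not describe the mechanics of the random administration. One
way to obtain exactly thirty respondents per form, offered purely as an
assumption, is to shuffle a stack holding thirty copies of each form and hand
the copies out in order. The `assign_forms` helper and the seed value below
are hypothetical.

```python
import random
from collections import Counter


def assign_forms(n_students: int, n_forms: int = 6, seed=None) -> list[int]:
    """Return one form number per student, with equal counts per form."""
    if n_students % n_forms:
        raise ValueError("n_students must be a multiple of n_forms")
    copies_per_form = n_students // n_forms
    stack = [form for form in range(1, n_forms + 1) for _ in range(copies_per_form)]
    random.Random(seed).shuffle(stack)   # random order, fixed counts per form
    return stack


assignment = assign_forms(180, seed=1980)
print(Counter(assignment))   # every form number appears 30 times
```

Shuffling a fixed stack, rather than drawing a form independently for each
student, is what guarantees equal cell sizes.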

Data Analysis. Differences in student responses to the two global items
across the six evaluation forms were analyzed through a 2 x 3 analysis of
variance (ANOVA) with repeated measures. The global assessments of course
and instructor were repeated across the two placement conditions (global items
before or after diagnostic items) and the three wording conditions (0, 10, or
20 of 30 items worded negatively). Resultant F-ratios were tested at the .05
level of significance.

Results. Global item means and standard deviations recorded on each of
the six evaluation forms are presented in Table 1. Results of the ANOVA
presented in Table 2 indicated that the instructor (4.71) was rated
significantly (p < .01) higher than the course (4.39). Also significant
(p < .01) was the Type of global rating x Placement interaction. Inspection
of the interaction cell means revealed that the overall ratings of the course
were lower when the global items followed (4.22) the diagnostic items rather
than preceded (4.52) them, whereas the overall instructor ratings were
approximately the same when presented either before (4.67) or after (4.74) the
diagnostic items. While the lowest course and instructor overall ratings were
obtained when 20 of the 30 diagnostic items were written negatively, there
were no significant (p < .14) differences identified for either global rating
across the three wording conditions.
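
The interaction means quoted above can be approximately recovered from the
cell means in Table 1. The sketch below assumes equal cell sizes (thirty
students per form, as reported), so that each marginal mean is a simple
average of cell means. Because Table 1 is rounded to two decimals, the
results can differ from the text in the last digit. The dictionary layout and
the helper name are illustrative only.

```python
from statistics import mean

# Study One cell means from Table 1, keyed by (placement, rating type);
# each list holds the 0/30, 10/30, and 20/30 wording conditions in order.
CELL_MEANS = {
    ("before", "instructor"): [4.70, 4.67, 4.63],
    ("before", "course"): [4.73, 4.47, 4.37],
    ("after", "instructor"): [4.80, 4.72, 4.69],
    ("after", "course"): [4.23, 4.28, 4.17],
}


def placement_by_rating_means(cells: dict) -> dict:
    """Average over wording conditions (valid because cell sizes are equal)."""
    return {key: mean(values) for key, values in cells.items()}


marginals = placement_by_rating_means(CELL_MEANS)
for (placement, rating), value in marginals.items():
    print(f"{rating:<10} {placement:<6} {value:.2f}")

# Change in each global rating when the global items come last.
course_shift = marginals[("after", "course")] - marginals[("before", "course")]
instructor_shift = (
    marginals[("after", "instructor")] - marginals[("before", "instructor")]
)
print(f"course shift {course_shift:+.2f}, instructor shift {instructor_shift:+.2f}")
```

Under these assumptions the course rating falls by roughly 0.3 of a scale
point when the global items follow the diagnostic items, while the instructor
rating moves by less than 0.1, which is the pattern behind the significant
Type x Placement interaction.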

Table 1

Study One: Global Item Means and Standard Deviations
Across the Six Evaluation Forms

                                     Placement Conditions
                     Before Diagnostic Items      After Diagnostic Items
                     Instructor    Course         Instructor    Course
Wording              Ratings       Ratings        Ratings       Ratings
Conditions           Mean   SD     Mean   SD      Mean   SD     Mean   SD

(0/30 negatives)     4.70   .47    4.73   .45     4.80   .48    4.23   .63
(10/30 negatives)    4.67   .84    4.47   .78     4.72   .68    4.28   .85
(20/30 negatives)    4.63   .67    4.37   .67     4.69   .60    4.17   .93


Table 2

Study One: ANOVA Summary Table

Source                  df        SS        MS         F

Placement (P)            1       .81       .81      1.11
Wording (W)              2       .24       .12       .16
P x W                    2       .54       .27       .37
Error                  174    129.12       .73

Type of Rating (T)       1      9.48      9.48     40.26*
T x P                    1      2.90      2.90     12.30*
T x W                    2       .94       .47      1.99
T x P x W                2       .67       .33      1.42
Error                  174     41.45       .24

*p < .01


Study Two

Results of Study One suggested that the placement, more so than the wording,
of diagnostic scale items may influence students' responses to global items.
However, limitations of the initial study prohibit a clear interpretation of
the findings. First, ratings were collected in only one course taught by one
instructor; therefore, the generalizability of the findings to other courses
and instructors is limited. Second, both the overall ratings of the course
and of the instructor were quite high. Effects due to the negative wording of
the diagnostic items may be more noticeable with less highly rated instructors
and courses for which students have more to criticize. A second study was
therefore conducted which was designed to remove these limitations.

Method. The six evaluation forms used in Study One were randomly
administered during the last week of the 1980 Fall semester in 14 sections of
an undergraduate introductory sex education and family life course taught by 7
instructors. Of the 275 students responding, approximately 45 responded to
each of the six forms.

Data Analysis. The same 2 x 3 ANOVA with repeated measures used in the
initial investigation was conducted.

Results. Means and standard deviations for the two global items recorded
on each of the six evaluation forms are presented in Table 3. ANOVA results
presented in Table 4 indicate significant (p < .01) differences between the
overall ratings of the course (3.62) and instructor (4.13), with the latter
ratings being higher. No significant (p < .05) differences were identified
for either global rating across wording or placement conditions.
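
As a rough arithmetic check, the F-ratios in Table 4 can be recomputed from
the printed sums of squares and degrees of freedom, taking each F as the
effect mean square divided by the error mean square of its own stratum
(between-subjects effects against the first Error row, within-subjects
effects against the second). This is a sketch under that standard
repeated-measures assumption; the data structures and function name are
illustrative, and small discrepancies reflect rounding in the printed table.

```python
# Values transcribed from Table 4 (Study Two): effect -> (df, SS, printed F).
BETWEEN = {"P": (1, 0.71, 0.51), "W": (2, 0.93, 0.34), "P x W": (2, 3.23, 1.16)}
WITHIN = {
    "T": (1, 35.24, 48.43),
    "T x P": (1, 0.10, 0.15),
    "T x W": (2, 1.49, 1.03),
    "T x P x W": (2, 0.03, 0.02),
}
ERROR_BETWEEN = (269, 374.16)   # (df, SS)
ERROR_WITHIN = (269, 195.74)    # (df, SS)


def f_ratios(effects: dict, error: tuple) -> dict:
    """Return {effect: F} with F = (SS_effect / df_effect) / (SS_error / df_error)."""
    error_df, error_ss = error
    ms_error = error_ss / error_df
    return {name: (ss / df) / ms_error for name, (df, ss, _) in effects.items()}


recomputed = {**f_ratios(BETWEEN, ERROR_BETWEEN), **f_ratios(WITHIN, ERROR_WITHIN)}
printed = {name: f for name, (_, _, f) in {**BETWEEN, **WITHIN}.items()}

for name, value in recomputed.items():
    print(f"{name:<10} recomputed F = {value:6.2f}   printed F = {printed[name]:6.2f}")

# Error df equals the number of respondents minus the number of forms.
print("error df:", 275 - 6)
```

With these inputs only the Type of Rating effect yields a large F (about
48.4), in line with the single starred entry in Table 4.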

Table 3

Study Two: Global Item Means and Standard Deviations
Across the Six Evaluation Forms

                                     Placement Conditions
                     Before Diagnostic Items      After Diagnostic Items
                     Instructor    Course         Instructor    Course
Wording              Ratings       Ratings        Ratings       Ratings
Conditions           Mean   SD     Mean   SD      Mean   SD     Mean   SD

(0/30 negatives)     4.17   .86    3.54  1.07     4.23   .96    3.55  1.50
(10/30 negatives)    4.19   .83    3.77   .84     3.91  1.20    3.47  1.14
(20/30 negatives)    4.11   .90    3.73   .82     4.19   .97    3.71  1.04


Table 4

Study Two: ANOVA Summary Table

Source                  df        SS        MS         F

Placement (P)            1       .71       .71       .51
Wording (W)              2       .93       .47       .34
P x W                    2      3.23      1.62      1.16
Error                  269    374.16      1.39

Type of Rating (T)       1     35.24     35.24     48.43*
T x P                    1       .10       .11       .15
T x W                    2      1.49       .75      1.03
T x P x W                2       .03       .01       .02
Error                  269    195.74       .73

*p < .01


Discussion

In neither study were the overall ratings of the instructor or course
affected by the negative or positive wording of the diagnostic items. In the
first study, however, the placement of the diagnostic items influenced the
global ratings of the course. The placement effects found in Study One
indicated that students rated the course, but not the instructor,
significantly lower when the global items were presented after rather than
before the diagnostic items. An informal observation of students during the
administration of the evaluation forms indicated that they responded to the
last two global items after responding to the diagnostic items. Failure to
find erasures of global item responses also indicated that students did not
change their initial global item responses after completing the diagnostic
items. These observations suggest that in responding to the diagnostic items
first, the students may have used them as a type of "score card" for
evaluating the overall quality of the instructor and course. The diagnostic
items indicated instructional areas which needed to be considered in the
global evaluations. Responding to the score card before making global
assessments apparently lowered the students' initial reactions to the course
but not to the instructor. Possibly the students became more realistic in
their assessments or were reminded of more weaknesses than strengths. Further
research wherein students are asked to explain their responses to the
evaluation forms is needed to find the correct explanation.

Why responding first to the diagnostic items altered the course ratings
only is also not clear. Research has found that students require less
prompting to discuss the strengths and weaknesses of an instructor than of a
course (Braskamp, Ory, and Pieper, 1980) and are more consistent in their
ratings of an instructor than of a course regardless of the evaluation method
used, i.e., open-ended questionnaires, rating scales, or group interviews
(Ory, Braskamp, and Pieper, 1980). Students may therefore have a set opinion
of the instructor already in mind and may not need the framework or prompting
provided by the diagnostic items. On the other hand, the framework provided by
the diagnostic items may help to narrow the range of areas needed to be
considered when evaluating an entire course (e.g., exams, homework, lectures)
and thus have greater impact on the ratings of the course.



More important than the reason(s) for the placement effects found in
Study One is the fact that such effects were not evident in the more
externally valid second study. Neither wording nor placement effects were
identified in the global ratings of 7 instructors teaching 14 course sections.
Results of Study Two confirm the initial study's failure to find significant
wording effects but fail to support the existence of the placement effects.
Instead, the lack of findings in Study Two which would indicate possible
sources of rating influence speaks well for the validity of student ratings.
Students appear to have a general opinion about their instructor and course
that is unaltered by the placement or wording of other scale items included on
the evaluation form.

With the current increase in college and university use of student
evaluations of instruction, the reliability and validity of student ratings
are consistently being challenged. Numerous research studies (Aleamoni
and Graham, 1974; Brandenburg, Slinde, and Batista, 1977; Frey, Leonard, and
Beatty, 1975; Marsh, 1980; McKeachie and Lin, 1971) have investigated the
extent to which extraneous variables (e.g., expected grade, class size, sex,
timing of administration) bias the measurement of teacher and course quality.
Results of these two studies add one (wording of diagnostic items) and
possibly two (placement of diagnostic items) more extraneous variables to a
growing list of factors which have little, if any, influence on global
assessments.





Reference Notes

1. Illinois Course Evaluation System: Its rationale and description (ICES
   Newsletter No. 2). Urbana-Champaign: University of Illinois, Measurement
   and Research Division, Office of Instructional Resources, August 1977.
   (Mimeo).

References 

Aleamoni, L. M. & Graham, M. H. The relationship between CEQ ratings and
    instructor's rank, class size, and course level. Journal of Educational
    Measurement, 1974, 11, 189-202.

Brandenburg, D. C., Braskamp, L. A. & Ory, J. C. Considerations for an
    evaluation program of instructional quality. CEDR Quarterly, 1979, 12,
    8-13.

Brandenburg, D. C., Slinde, J. A. & Batista, E. E. Student ratings of
    instruction: Validity and normative interpretations. Journal of Research
    in Higher Education, 1977, 7, 67-78.

Braskamp, L. A., Ory, J. C., & Pieper, D. M. Written comments: Dimensions
    of instructional quality. Journal of Educational Psychology, in press.

Couch, A. & Keniston, K. Yea sayers and nay sayers: Agreeing response set as
    a personality variable. Journal of Abnormal and Social Psychology, 1960,
    60, 151-174.

Cronbach, L. J. Response sets and test validity. Educational and
    Psychological Measurement, 1946, 6, 475-494.

Cronbach, L. J. Further evidence on response sets and test design.
    Educational and Psychological Measurement, 1950, 10, 3-31.

Edwards, A. L. Techniques of Attitude Scale Construction. New York:
    Appleton-Century-Crofts, 1957.

Frey, P. W., Leonard, D. W., & Beatty, W. W. Student ratings of instruction:
    Validation research. American Educational Research Journal, 1975, 12,
    435-447.

Lemon, N. Attitudes and their Measurement. New York: John Wiley & Sons,
    1973.

Likert, R. A technique for the measurement of attitudes. Archives of
    Psychology, 1932, 140, 44-53.

Marsh, H. W. The influence of student, course, and instructor characteristics
    on evaluations of university teaching. American Educational Research
    Journal, 1980, 17, 219-237.

McKeachie, W. J. & Lin, Y. Sex differences in student response to college
    teachers: Teacher warmth and teacher sex. American Educational Research
    Journal, 1971, 8, 221-226.

Ory, J. C., Braskamp, L. A., & Pieper, D. M. Congruency of student evaluative
    information collected by three methods. Journal of Educational
    Psychology, 1980, 72, 181-185.






